Abstract: We study Rademacher processes where the coefficients are
functions evaluated at fixed but arbitrary covariables. Specifically, we
assume that the function class under consideration is parametrized by the
standard cocube in $l$ dimensions, and we are mainly interested in the
high-dimensional, asymptotic situation, that is, $l$ as well as the number of
Rademacher variables $n$ go to infinity with $l$ much larger than $n$. We refine
and apply classical entropy bounds and Majorizing Measures, both going back to
the well-known idea of chaining. In this way, we derive general upper bounds for
Rademacher processes. In the linear case and under high correlations, we
further improve on these bounds. In particular, we give bounds independent
of $l$ for highly correlated covariables.
1 Introduction



We study upper bounds for the quantity

$$\mathbb{E} \sup_{\theta \in \Theta} \Big| \sum_{i=1}^{n} \epsilon_i \phi_\theta(x_i) \Big| \qquad (1)$$

with $\Theta := \{\theta \in \mathbb{R}^l : \|\theta\|_1 \le M\}$, i.i.d. Rademacher variables $\epsilon_i$ and
real-valued functions $\phi_\theta$ evaluated at fixed but arbitrary $x_i$. We are mainly interested
in the high-dimensional, asymptotic situation, i.e., $l \gg n$ and $n, l \to \infty$, and we treat
a general setting, the linear case, as well as a setting involving strongly correlated $x_i$.
We show in particular that strong correlations can lead to better asymptotic bounds.

Chaining is the main tool for our investigations. For an arbitrary process
$\{Z_\lambda : \lambda \in \Lambda\}$ it means the following: instead of studying terms of the form
$|Z_\lambda - Z_{\lambda'}|$ for (possibly very distinct) random variables $Z_\lambda, Z_{\lambda'}$ directly, one
applies the triangle inequality

$$|Z_\lambda - Z_{\lambda'}| \le \sum_{n=1}^{m} |Z_{\lambda_n} - Z_{\lambda_{n-1}}|$$

and studies the increments $|Z_{\lambda_n} - Z_{\lambda_{n-1}}|$, where $\lambda_n, \lambda_{n-1} \in \Lambda$, $\lambda_0 = \lambda$ and $\lambda_m = \lambda'$.
Usually, the $Z_{\lambda_1}, ..., Z_{\lambda_m}$ are constructed such that $Z_\lambda - Z_{\lambda'}$ can be thought of as the
sum of the small "chain links" $Z_{\lambda_n} - Z_{\lambda_{n-1}}$. It is often easier to control these chain
links than to control $Z_\lambda - Z_{\lambda'}$ directly. This approach leads to two general bounds
for empirical processes. On the one hand, there is the classical "Entropy Bound" (see
for example [Tal05], [vdVW00] and references therein). Its integral version as stated
in [vdVW00] is introduced and refined at the beginning of the second part. Then, we
apply this bound to the problem stated above, where we follow ideas given in [Car85]
for some entropy calculations. On the other hand, there are "Majorizing Measures" (see
for example [RT88], [Tal94] and [Tal96]). They are introduced and applied in the third
part. Majorizing Measures are rather difficult to use; however, we show that for highly
correlated covariables they can lead to substantially better results.

We conclude this section with some notation and the main results. 

Notation: For a pseudometric space $(S, d)$ with unit ball $B$ we denote the covering
numbers by $N(S, d, \epsilon)$, i.e., $N(S, d, \epsilon)$ is the minimal number of translates of
$\epsilon B$ needed to cover $S$. The logarithm of the covering numbers (as a function of
$\epsilon$) is called entropy. We define similarly $D(S, d, \epsilon)$ as the maximal number of
$\epsilon$-separated points in $S$. Obviously, $N(S, d, \epsilon) \le D(S, d, \epsilon) \le N(S, d, \epsilon/2)$.
And finally, if the pseudometric is induced by a seminorm, we occasionally write
$N(S, \|\cdot\|, \epsilon)$ or $D(S, \|\cdot\|, \epsilon)$.
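
The first inequality in $N \le D \le N(\cdot/2)$ rests on the fact that a maximal $\epsilon$-separated set is automatically an $\epsilon$-cover. The following is a minimal sketch of that fact on a finite point cloud; the point cloud, the Euclidean metric and the helper names `greedy_packing` and `is_cover` are illustrative assumptions, not part of the paper.

```python
import math
import random

def greedy_packing(points, dist, eps):
    """Greedily build a maximal eps-separated subset of `points`."""
    centers = []
    for p in points:
        # Keep p only if it is more than eps away from every chosen center.
        if all(dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers

def is_cover(points, centers, dist, eps):
    """Check that every point lies within eps of some center."""
    return all(any(dist(p, c) <= eps for c in centers) for p in points)

# Hypothetical finite set S in the unit square with the Euclidean metric.
random.seed(0)
S = [(random.random(), random.random()) for _ in range(200)]
eps = 0.25
packing = greedy_packing(S, math.dist, eps)
covered = is_cover(S, packing, math.dist, eps)
```

Since the greedy set cannot be enlarged, every remaining point is within $\epsilon$ of a chosen center, so its cardinality bounds the covering number from above.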

We are mainly interested in the pseudometric space $(\Theta, d)$ with
$d(\theta, \theta') := \|(\phi_\theta(x_1) - \phi_{\theta'}(x_1), ..., \phi_\theta(x_n) - \phi_{\theta'}(x_n))^T\|_2$, where
$x := (x_1, ..., x_n) \in \mathcal{X}^n$ for an arbitrary set $\mathcal{X}$ and
$\{\phi_\theta : \mathcal{X} \to \mathbb{R} : \theta \in \Theta\}$ is a set of functions, and we define
$X_\theta(x) := \sum_{i=1}^n \epsilon_i \phi_\theta(x_i)$ for simplicity. The choice of the pseudometric $d$ is
motivated by the fact that $\{X_\theta(x) : \theta \in \Theta\}$ is sub-Gaussian with respect to $d$
due to Hoeffding's inequality, that is,

$$P(|X_\theta - X_{\theta'}| \ge \lambda) \le 2 \exp\Big(-\frac{\lambda^2}{2 d^2(\theta, \theta')}\Big) \qquad \forall \lambda > 0,\ \theta, \theta' \in \Theta.$$

In other words, the tail behavior is as for Gaussian processes. 
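
For a small number of Rademacher variables, the sub-Gaussian tail can be checked exactly by enumerating all sign patterns. The sketch below does this for the standard form $2\exp(-\lambda^2/(2d^2))$ of Hoeffding's inequality; the coefficient vector stands in for the differences $\phi_\theta(x_i) - \phi_{\theta'}(x_i)$ and its values are made up for illustration.

```python
import itertools
import math

def tail_probability(coeffs, lam):
    """Exact P(|sum_i eps_i * c_i| >= lam) over all Rademacher sign patterns."""
    n = len(coeffs)
    hits = sum(
        1
        for signs in itertools.product((-1, 1), repeat=n)
        if abs(sum(s * c for s, c in zip(signs, coeffs))) >= lam
    )
    return hits / 2 ** n

def hoeffding_bound(coeffs, lam):
    """2 * exp(-lam^2 / (2 d^2)) with d^2 the squared Euclidean norm of coeffs."""
    d2 = sum(c * c for c in coeffs)
    return 2.0 * math.exp(-lam ** 2 / (2.0 * d2))

# Hypothetical values of phi_theta(x_i) - phi_theta'(x_i) for n = 8.
coeffs = [0.3, -1.2, 0.7, 0.5, -0.4, 1.0, 0.2, -0.9]
checks = [
    (lam, tail_probability(coeffs, lam), hoeffding_bound(coeffs, lam))
    for lam in (0.5, 1.0, 2.0, 3.0, 4.0)
]
```

Each triple in `checks` holds the threshold, the exact tail probability and the Hoeffding bound, so the two can be compared directly.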

Main Results: We derive upper bounds for the quantity (1) under three different sets
of assumptions. We are not aware of equally sharp bounds in the literature.

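In Section 2.2 we derive a bound under the assumption that
$\{\phi_\theta : \mathcal{X} \to \mathbb{R} : \theta \in \Theta\}$ has a certain contraction property:

Theorem 1.1. If there exists a function $A : \mathcal{X}^n \to \mathbb{R}$ fulfilling

$$d(\theta, \theta') \le \sqrt{n}\, A(x) \|\theta - \theta'\|_2 \qquad \forall \theta, \theta' \in \Theta, \qquad (2)$$

then there is a universal constant $K$ such that for $\theta_0 \in \Theta$ arbitrary

$$\mathbb{E} \sup_{\theta \in \Theta} |X_\theta(x)| \le \mathbb{E}|X_{\theta_0}(x)| + K \sqrt{n \log(l+1)}\, \log(n+1)\, A(x) M. \qquad (3)$$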

In the linear case, the $\log(n+1)$ in (3) can be omitted and the contraction property
(2) can be relaxed. This is stated in the following theorem, which we prove in Section 2.3:

Theorem 1.2. Let $\psi_j : \mathcal{X} \to \mathbb{R}$ be arbitrary functions for $j = 1, ..., l$. If
$\phi_\theta(x_i) = \sum_{j=1}^{l} \psi_j(x_i) \theta_j$ and if $A : \mathcal{X}^n \to \mathbb{R}$ fulfills

$$d(\theta, 0) \le \sqrt{n}\, A(x) M \qquad \forall \theta \in \Theta,$$

there is a universal constant $K$ such that

$$\mathbb{E} \sup_{\theta \in \Theta} |X_\theta(x)| \le K \sqrt{n \log(l+1)}\, A(x) M.$$

For strongly correlated covariables, we can improve on these bounds. We show this in
Section 3.2 with the help of Majorizing Measures. To state the result, we let
$X' \in \mathbb{R}^{n \times l'}$, $X'' \in \mathbb{R}^{n \times l''}$. Furthermore, we denote the $i$-th row of $X'$ ($X''$ resp.)
by $x_i'$ ($x_i''$ resp.), the columns by $y_j'$ ($y_j''$ resp.) and we set $\theta = (\theta', \theta'')$. We then
impose the usual normalization on the matrices, that is $\|y_j'\|_2 = \|y_j''\|_2 = \sqrt{n}$, and
state the following result:

Theorem 1.3. Let $g : \mathbb{R}^2 \to \mathbb{R}$ be a contraction w.r.t. the Euclidean metric. If there
are orthogonal matrices $R'$, $R''$ such that for all $i$

3=1 i=i 
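then there is a universal constant $K$ such that for $\theta_0 \in \Theta$ arbitrary

$$\mathbb{E} \sup_{\theta \in \Theta} \Big| \sum_{i=1}^{n} \epsilon_i\, g(x_i'^T \theta', x_i''^T \theta'') \Big| \le \mathbb{E} \Big| \sum_{i=1}^{n} \epsilon_i\, g(x_i'^T \theta_0', x_i''^T \theta_0'') \Big| + K \sqrt{n \log(n+1)}\, M.$$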



So, the factor $\sqrt{n \log(l+1)}\, \log(n+1)$ in the bound (3) can be replaced by
$\sqrt{n \log(n+1)}$ in this case. The required correlation is expressed by assumption (4):
it means that the columns of the matrices $X'$ and $X''$ can be enveloped by small
ellipsoids. The matrices $R'$ and $R''$ are the transformations that bring these ellipsoids
into standard form.



2 Entropy Bounds 

In this part, we introduce entropy bounds and apply them to Rademacher processes. In
the first section, we prove adapted versions of two classical entropy results. The second
and the third sections are devoted to the proofs of Theorem 1.1 and Theorem 1.2 and a
simple example.



2.1 Refinement of Entropy Bounds 

Here, we introduce slightly modified versions of two classical entropy bounds for
empirical processes (see e.g. [vdVW00], Theorem 2.2.4 and Corollary 2.2.8). The
modification is the lower bound for the integration. For convenience, we give the proofs
in detail, although they closely follow the ones given in [vdVW00].

Beforehand, we recall the definition of the Orlicz norm for a non-decreasing and
convex function $\Psi$ with $\Psi(0) = 0$:

$$\|X\|_\Psi := \inf\Big\{\lambda > 0 : \mathbb{E}\Psi\Big(\frac{|X|}{\lambda}\Big) \le 1\Big\}.$$
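
As a small worked instance of this definition, the Orlicz norm of a discrete random variable can be found numerically, because $\lambda \mapsto \mathbb{E}\Psi(|X|/\lambda)$ is non-increasing. The sketch below uses a geometric bisection; the function name `orlicz_norm`, the search interval and the choice $\Psi(x) = e^{x^2} - 1$ with a single Rademacher variable are illustrative assumptions. In that case the norm has the closed form $1/\sqrt{\log 2}$.

```python
import math

def orlicz_norm(values, probs, psi, lo=1e-9, hi=1e9, iters=200):
    """Smallest lam in [lo, hi] with E psi(|X| / lam) <= 1, by geometric bisection."""
    def feasible(lam):
        try:
            total = sum(p * psi(abs(v) / lam) for v, p in zip(values, probs))
        except OverflowError:
            # psi blew up, so lam is far too small.
            return False
        return total <= 1.0

    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

def psi2(x):
    """The function exp(x^2) - 1 associated with sub-Gaussian tails."""
    return math.exp(x * x) - 1.0

# A single Rademacher variable: values -1 and +1 with probability 1/2 each.
norm = orlicz_norm([-1.0, 1.0], [0.5, 0.5], psi2)
closed_form = 1.0 / math.sqrt(math.log(2.0))
```

The numerical value agrees with the closed form because $\mathbb{E}\Psi(|X|/\lambda) = e^{1/\lambda^2} - 1 \le 1$ exactly when $\lambda \ge 1/\sqrt{\log 2}$.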

We are then able to formulate and prove an important entropy bound:

Lemma 2.1. Let $\Psi : \mathbb{R} \to \mathbb{R}$ be a convex, non-decreasing and non-constant function
with $\Psi(0) = 0$ and

$$\limsup_{x, y \to \infty} \frac{\Psi(x)\Psi(y)}{\Psi(cxy)} < \infty$$

for a constant $c$. Define $\Psi(\infty) := \infty$, $\Psi^{-1}(y) := \sup\{x : \Psi(x) \le y\}$ and assume
$\Psi^{-1}(1) > 0$. Furthermore, let $\{X_t : t \in T\}$ be a stochastic process with

$$\|X_s - X_t\|_\Psi \le C d(s, t) \qquad \forall s, t \in T$$

and

$$|X_s - X_t| \le a\, d(s, t) \qquad \forall s, t \in T \qquad (5)$$

for a pseudometric $d$ and positive constants $C$ and $a$. Then there are universal functions
$K = K(C, \Psi)$ and $U = U(C, a)$ such that for all $0 < \eta \le \delta$

$$\Big\| \sup_{d(s,t) \le \delta} |X_s - X_t| \Big\|_\Psi \le K \Big[ \int_{\eta/U}^{\eta/2} \Psi^{-1}(D(T, d, \epsilon))\, d\epsilon + \delta\, \Psi^{-1}(D^2(T, d, \eta)) \Big]. \qquad (6)$$



Comparing this to [vdVW00], note that we introduced the additional condition (5).
This is to establish the lower integral bound in the inequality (6).

Proof. We may assume that the covering numbers for $\epsilon > \eta/U$ and the corresponding
integral in (6) are finite since the inequality is trivial otherwise. We then fix
$\eta \in \mathbb{R}_+$ and $k \in \mathbb{N}$ and construct nested sets $T_0 \subset T_1 \subset ... \subset T_{k+1} \subset T$ such that for
every $j \le k+1$, $T_j$ is maximal w.r.t. $d(s, t) > \eta 2^{-j}$ for all $s, t \in T_j$, $s \ne t$.

According to the definition of $D$, it holds that $|T_j| \le D(T, d, \eta 2^{-j})$. We
will assume $U 2^{-(k+1)} > 1$ ($U$ will be defined later) and hence finitely many elements
in every set; this will be justified later. Now, we assign each point $t_{j+1} \in T_{j+1}$
to a unique point $t_j \in T_j$ such that $d(t_{j+1}, t_j) \le \eta 2^{-j}$. In this way, we define for all
$t_{k+1} \in T_{k+1}$ chains $t_{k+1} \mapsto ... \mapsto t_0 \in T_0$ and use the notation
$c(t_{k+1}) := \{t_{k+1}, ..., t_0\}$. Let $s_{k+1}, t_{k+1} \in T_{k+1}$. We then get for elements of these chains



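$$|(X_{s_{k+1}} - X_{s_0}) - (X_{t_{k+1}} - X_{t_0})| = \Big| \sum_{j=0}^{k} (X_{s_{j+1}} - X_{s_j}) - \sum_{j=0}^{k} (X_{t_{j+1}} - X_{t_j}) \Big|$$
$$\le \sum_{j=0}^{k} |X_{s_{j+1}} - X_{s_j}| + \sum_{j=0}^{k} |X_{t_{j+1}} - X_{t_j}|$$
$$\le 2 \sum_{j=0}^{k} \max\{|X_u - X_v| : u \in T_{j+1}, v \in T_j \cap c(u)\}.$$

Applying Lemma 2.2.2 of [vdVW00], we find a constant $K$ depending on $\Psi$ only such
that

$$\big\| \max |(X_{s_{k+1}} - X_{s_0}) - (X_{t_{k+1}} - X_{t_0})| \big\|_\Psi$$
$$\le 2 \sum_{j=0}^{k} \big\| \max\{|X_u - X_v| : u \in T_{j+1}, v \in T_j \cap c(u)\} \big\|_\Psi$$
$$\le 2K \sum_{j=0}^{k} \Psi^{-1}(|T_{j+1}|) \max\{\|X_u - X_v\|_\Psi : u \in T_{j+1}, v \in T_j \cap c(u)\}$$
$$\le 2KC \sum_{j=1}^{k+1} \Psi^{-1}(D(T, d, \eta 2^{-j}))\, \eta 2^{-j+1}$$
$$\le 8KC \int_{\eta 2^{-(k+2)}}^{\eta/2} \Psi^{-1}(D(T, d, \epsilon))\, d\epsilon.$$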
In the first line, the maximum is taken over all $s_{k+1}, t_{k+1} \in T_{k+1}$ and their associated
points in $T_0$. We then note that for $\delta \ge \eta$

$$\big\| \max\{|X_s - X_t| : s, t \in T_{k+1}, d(s, t) \le \delta\} \big\|_\Psi$$
$$\le \big\| \max\{|(X_s - X_{s_0}) - (X_t - X_{t_0})| : s, t \in T_{k+1}, d(s, t) \le \delta\} \big\|_\Psi$$
$$+ \big\| \max\{|X_{s_0} - X_{t_0}| : s_0, t_0 \in T_0,\ s, t \in T_{k+1},\ s_0 \in c(s),\ t_0 \in c(t)\} \big\|_\Psi.$$

The first term on the r.h.s. of the last display is bounded according to what we have
done above. The second term may be rewritten using

$$|X_{s_0} - X_{t_0}| \le |(X_{s_0} - X_{s_{k+1}}) - (X_{t_0} - X_{t_{k+1}})| + |X_{s_{k+1}} - X_{t_{k+1}}|.$$

Here, we assign to each $s_0 \in T_0$ and each $t_0 \in T_0$ a fixed $s_{k+1} \in T_{k+1}$, $t_{k+1} \in T_{k+1}$
respectively, such that $s_0 \in c(s_{k+1})$ and $t_0 \in c(t_{k+1})$. We demand furthermore that
$d(s_{k+1}, t_{k+1}) \le \delta$. This yields together with Lemma 2.2.2 of [vdVW00]

$$\big\| \max\{|X_s - X_t| : s, t \in T_{k+1}, d(s, t) \le \delta\} \big\|_\Psi$$
$$\le 16KC \int_{\eta 2^{-(k+2)}}^{\eta/2} \Psi^{-1}(D(T, d, \epsilon))\, d\epsilon + \big\| \max |X_{s_{k+1}} - X_{t_{k+1}}| \big\|_\Psi$$
$$\le 16KC \int_{\eta 2^{-(k+2)}}^{\eta/2} \Psi^{-1}(D(T, d, \epsilon))\, d\epsilon + K \Psi^{-1}(D^2(T, d, \eta)) \max \|X_{s_{k+1}} - X_{t_{k+1}}\|_\Psi$$
$$\le 16KC \int_{\eta 2^{-(k+2)}}^{\eta/2} \Psi^{-1}(D(T, d, \epsilon))\, d\epsilon + KC\delta\, \Psi^{-1}(D^2(T, d, \eta)).$$

The maximum in the second line is taken as described above. We then note that

$$\Big\| \sup_{d(s,t) \le \delta} |X_s - X_t| \Big\|_\Psi = \Big\| \sup_{d(s,t) \le \delta} |(X_s - X_{s^*}) - (X_t - X_{t^*}) + (X_{s^*} - X_{t^*})| \Big\|_\Psi$$
$$\le 2 \Big\| \sup_{s \in T} |X_s - X_{s^*}| \Big\|_\Psi + \big\| \max\{|X_s - X_t| : s, t \in T_{k+1}, d(s, t) \le 3\delta\} \big\|_\Psi$$

where we define $s^* := \arg\min_{s' \in T_{k+1}} d(s', s)$ and $t^* := \arg\min_{t' \in T_{k+1}} d(t', t)$ and use

$$d(s^*, t^*) \le d(s^*, s) + d(s, t) + d(t, t^*) \le 3\delta.$$

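We find moreover

$$\Big\| \sup_{s \in T} |X_s - X_{s^*}| \Big\|_\Psi = \inf\Big\{\lambda > 0 : \mathbb{E}\Psi\Big(\sup_{s \in T} |X_s - X_{s^*}| / \lambda\Big) \le 1\Big\} \le \frac{a \eta 2^{-(k+1)}}{\Psi^{-1}(1)},$$

where the last step uses condition (5) and $d(s, s^*) \le \eta 2^{-(k+1)}$.

We may assume w.l.o.g. that $T$ is not empty and $C, K > 0$. So there is a $k_0 \in \mathbb{N}$
(depending only on $\Psi$ and $a$) such that $\frac{2a 2^{-(k_0+1)}}{\Psi^{-1}(1)} \le \frac{KC}{4} \Psi^{-1}(1)$. Then,

$$2 \Big\| \sup_{s \in T} |X_s - X_{s^*}| \Big\|_\Psi \le \frac{2a \eta 2^{-(k_0+1)}}{\Psi^{-1}(1)} \le KC \frac{\eta}{4} \Psi^{-1}(1)$$
$$\le KC \frac{\eta}{2} (1 - 2^{-(k_0+1)}) \Psi^{-1}(1)$$
$$\le KC \int_{\eta 2^{-(k_0+2)}}^{\eta/2} \Psi^{-1}(D(T, d, \epsilon))\, d\epsilon.$$

We define $U := 2^{k_0+2}$ to conclude the proof. □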
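
The chain construction used in this proof can be illustrated on a finite point set. The sketch below builds nested sets $T_0 \subset ... \subset T_{k+1}$, each maximal with respect to pairwise distance greater than $\eta 2^{-j}$, and links every point of the finest set to nearest points in the coarser ones. The point cloud, the values of `eta` and `levels` (playing the role of $k+1$) and the helper names are assumptions made for illustration only.

```python
import math
import random

def nested_nets(points, dist, eta, levels):
    """Return nets T_0, ..., T_levels with T_j nested in T_{j+1}, each T_j
    maximal w.r.t. pairwise distance > eta * 2**(-j)."""
    nets, current = [], []
    for j in range(levels + 1):
        radius = eta * 2.0 ** (-j)
        current = list(current)  # start from the coarser net to keep nesting
        for p in points:
            if all(dist(p, q) > radius for q in current):
                current.append(p)
        nets.append(current)
    return nets

def chain(p, nets, dist):
    """Chain from p in the finest net down to T_0 via nearest points."""
    links = [p]
    for net in reversed(nets[:-1]):
        links.append(min(net, key=lambda q: dist(links[-1], q)))
    return links

# Hypothetical index set T: random points in the unit square.
random.seed(1)
T = [(random.random(), random.random()) for _ in range(300)]
eta, levels = 0.5, 4
nets = nested_nets(T, math.dist, eta, levels)
chains = [chain(p, nets, math.dist) for p in nets[-1]]
```

By maximality of $T_j$, each link from $T_{j+1}$ to $T_j$ has length at most $\eta 2^{-j}$, which is the property the proof uses to sum the increments geometrically.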



Because we often do not need the generality of Lemma 2.1, we derive in the following
a result for the important special case of sub-Gaussian processes:

Lemma 2.2. Let $\{X_t : t \in T\}$ be a sub-Gaussian process w.r.t. a pseudometric $d$ such
that

$$|X_s - X_t| \le a\, d(s, t) \qquad \forall s, t \in T$$

for a constant $a$. Then there exists a function $U = U(a)$ and a universal constant $K$
such that for all $\delta > 0$ and $t_0 \in T$ arbitrary

$$\mathbb{E} \sup_{t : d(t, t_0) \le \delta} |X_t| \le \mathbb{E}|X_{t_0}| + K \int_{\delta/U}^{\delta/2} \sqrt{\log(1 + D(T, d, \epsilon))}\, d\epsilon. \qquad (7)$$

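Proof. We apply Lemma 2.1 to $\Psi(x) := e^{x^2} - 1$. The function $\Psi$ is convex and
increasing on $[0, \infty)$ and $\Psi(0) = 0$. It holds that

$$\limsup_{x, y \to \infty} \frac{\Psi(x)\Psi(y)}{\Psi(xy)} < \infty$$

and

$$\|X_s - X_t\|_\Psi \le \sqrt{6}\, d(s, t) \qquad \forall s, t \in T.$$

So, the conditions of Lemma 2.1 are met. We then set $\eta = \delta$ in Lemma 2.1 and note
that

$$\Psi^{-1}(m^2) = \sqrt{\log(1 + m^2)} \le \sqrt{\log(1 + m)^2} = \sqrt{2}\, \Psi^{-1}(m).$$

So there is a universal constant $K'$ such that (recall that $U \ge 4$, cf. proof of Lemma 2.1)

$$\Big\| \sup_{s, t : d(s,t) \le \delta} |X_s - X_t| \Big\|_\Psi \le K' \int_{\delta/U}^{\delta/2} \sqrt{\log(1 + D(T, d, \epsilon))}\, d\epsilon.$$

Since $\sqrt{\log 2} \cdot \mathbb{E}|X| \le \|X\|_\Psi$ for any random variable $X$, there is a constant $K$ such that

$$\mathbb{E} \sup_{s, t : d(s,t) \le \delta} |X_s - X_t| \le K \int_{\delta/U}^{\delta/2} \sqrt{\log(1 + D(T, d, \epsilon))}\, d\epsilon.$$

We conclude the proof by noting that for any $t_0 \in T$

$$\mathbb{E} \sup_{t : d(t, t_0) \le \delta} |X_t| - \mathbb{E}|X_{t_0}| \le \mathbb{E} \sup_{s, t : d(s,t) \le \delta} |X_s - X_t|.$$

□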
2.2 Proof of Theorem 1.1

The proof of Theorem 1.1 has two main ingredients: first, the entropy bound of
Lemma 2.1 and second, some subtle entropy estimates. For the entropy estimates,
we rely on ideas given in Lemma 1 of [Car85].

Proof of Theorem 1.1. To simplify the notation, we set $X_\theta := X_\theta(x)$ and $A := A(x)$. We
then note that, as a consequence of Hoeffding's inequality, $\{X_\theta : \theta \in \Theta\}$ is sub-Gaussian
with respect to the pseudometric

$$d(\theta, \theta') := \|(\phi_\theta(x_1) - \phi_{\theta'}(x_1), ..., \phi_\theta(x_n) - \phi_{\theta'}(x_n))^T\|_2.$$

We find that

$$|X_\theta - X_{\theta'}| \le \sqrt{n}\, d(\theta, \theta') \le nA \|\theta - \theta'\|_2 \qquad \forall \theta, \theta' \in \Theta. \qquad (8)$$

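Now, we want to calculate the entropy linked with the stochastic process and the
pseudometric $d$. To this end, we define

$$V := \{e_1, ..., e_{2l}\} \subset \mathbb{R}^l$$

using the notation $(e_i)_j := \delta_{ij}$ for $i \le l$, where $\delta_{ij}$ is the Kronecker symbol, and
$e_i := -e_{2l-i+1}$ for $i > l$. So $\Theta$ is the set
$\{\theta \in \mathbb{R}^l : \exists \lambda \in \mathbb{R}^{2l}, \|\lambda\|_1 \le M, \theta = \sum_{i=1}^{2l} \lambda_i e_i\}$.
We then fix a $\lambda \in \mathbb{R}^{2l}$ such that $\|\lambda\|_1 \le M$. Define independent random variables
$Y_1, ..., Y_k \in V \cup \{0\}$ with (following [Car85])

$$P(Y_i = e_j) = \frac{|\lambda_j|}{M} \qquad \forall i = 1, ..., k,\ j = 1, ..., 2l$$

and

$$P(Y_i = 0) = 1 - \sum_{j=1}^{2l} \frac{|\lambda_j|}{M}.$$

We obtain

$$\mathbb{E} Y_i = \frac{1}{M} \sum_{j=1}^{2l} |\lambda_j| e_j \qquad \forall i.$$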
i=i 

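Next, we set $\bar{Y}_k := \frac{1}{k} \sum_{i=1}^{k} Y_i$. One may check that

$$\mathbb{E}\, d(M\bar{Y}_k, M\mathbb{E}Y_1) \le 2 \sqrt{\frac{n}{k}}\, AM$$

using the contraction property (2). So, the distance of at least one realization of $M\bar{Y}_k$
to $M\mathbb{E}Y_1$ is smaller than or equal to $2\sqrt{n/k}\, AM$. There are at most $\binom{2l+k-1}{k}$
realizations of $M\bar{Y}_k$, and for $M\mathbb{E}Y_1$ it holds that
$\forall \theta \in \Theta\ \exists \lambda : \|\lambda\|_1 \le M,\ \theta = M\mathbb{E}Y_1$. Hence, using
Stirling's inequalities, we get

$$N\Big(\Theta, d, 2\sqrt{\tfrac{n}{k}}\, AM\Big) \le \binom{2l+k-1}{k} \le \Big(e + \frac{2el}{k}\Big)^k.$$

Therefore,

$$N(\Theta, d, \epsilon) \le \Big(e + \frac{e l \epsilon^2}{2nM^2A^2}\Big)^{\frac{4nM^2A^2}{\epsilon^2} + 1}$$

when we choose $k := \lceil 4nM^2A^2 / \epsilon^2 \rceil$. Consequently,

$$D(\Theta, d, \epsilon) \le N(\Theta, d, \epsilon/2) \le \Big(e + \frac{e l \epsilon^2}{8nM^2A^2}\Big)^{\frac{16nM^2A^2}{\epsilon^2} + 1}.$$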

We may now use Lemma 2.1 and get for a universal constant $K$ and a constant $U$
depending only on $\sqrt{n}$ (see condition (5) and inequality (8))

$$\mathbb{E} \sup_{\theta \in \Theta} |X_\theta| - \mathbb{E}|X_{\theta_0}| \le K \int_{2\sqrt{n}AM/U}^{\sqrt{n}AM} \sqrt{\log(1 + D(\Theta, d, \epsilon))}\, d\epsilon.$$

Regarding the last part of the proof of Lemma 2.1, we find a universal constant $V$ such
that $U = \sqrt{n} V$. The result then follows by a simple calculation. □
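
The sampling step in this proof is commonly known as Maurey's empirical method: a point of the $\ell_1$-ball is written as $M$ times an expectation and then approximated by $M$ times an average of $k$ i.i.d. draws. The sketch below is a minimal illustration under simplifying assumptions: it draws $\mathrm{sign}(\theta_j) e_j$ directly instead of using the $2l$ vectors $e_1, ..., e_{2l}$, it measures the error in the Euclidean norm rather than in $d$, and the values of `theta`, `M` and `k` are made up. In the Euclidean norm the mean squared error is at most $M^2/k$.

```python
import math
import random

def maurey_sample(theta, M, k, rng):
    """Return M * (average of k i.i.d. draws) approximating theta.

    Each draw equals sign(theta_j) * e_j with probability |theta_j| / M
    and 0 with the remaining probability, so its mean is theta / M.
    """
    l = len(theta)
    l1 = sum(abs(t) for t in theta)
    weights = [abs(t) / M for t in theta] + [max(0.0, 1.0 - l1 / M)]
    approx = [0.0] * l
    for _ in range(k):
        j = rng.choices(range(l + 1), weights=weights)[0]
        if j < l:  # index l stands for the zero vector
            approx[j] += math.copysign(M / k, theta[j])
    return approx

def sq_error(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical parameter in the l1-ball of radius M = 1.
rng = random.Random(0)
theta, M, k = [0.5, -0.3, 0.2, 0.0, 0.0, 0.0], 1.0, 50
samples = [maurey_sample(theta, M, k, rng) for _ in range(2000)]
mean_sq_error = sum(sq_error(s, theta) for s in samples) / len(samples)
```

Every sample has at most $k$ nonzero increments of size $M/k$, which is why the number of possible realizations, and hence the covering number, can be bounded by a binomial coefficient.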

2.3 Proof of Theorem 1.2 and an Example

In the linear case, we can get rid of one of the logarithms. This is because we can
transform the parameter space into a lower dimensional one. We note that in the proof
of this theorem, the lower bounds for the integrals in Lemma 2.1 and Lemma 2.2 are not
necessary. Additionally, no difficult entropy estimates have to be made.

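Proof of Theorem 1.2. Again, we set $X_\theta := X_\theta(x)$ and $A := A(x)$ and note that

$$\sup_{\theta \in \Theta} |X_\theta| = \sup_{\theta \in \Theta} |\theta^T a|$$

with $a := (\sum_{i=1}^{n} \epsilon_i \psi_1(x_i), ..., \sum_{i=1}^{n} \epsilon_i \psi_l(x_i))^T \in \mathbb{R}^l$. The map
$\theta \mapsto |\theta^T a|$ attains its maximum on $\Theta$ at $\theta_0$ where $(\theta_0)_i := M\delta_{ip}$ with $p$ such
that $|a_p| \ge |a_m|$ for all $m = 1, ..., l$. So we have

$$\mathbb{E} \sup_{\theta \in \Theta} |X_\theta| = \mathbb{E} \sup_{\theta \in \Theta'} |X_\theta|$$

for $\Theta' := \{(M, 0, ..., 0)^T, ..., (0, ..., 0, M)^T, (0, ..., 0)^T\}$. As a consequence of Hoeffding's
inequality, $\{X_\theta : \theta \in \Theta'\}$ is sub-Gaussian with respect to the pseudometric
$d(\theta, \theta') := \|(\phi_\theta(x_1) - \phi_{\theta'}(x_1), ..., \phi_\theta(x_n) - \phi_{\theta'}(x_n))^T\|_2$ and it holds for all
$\theta \in \Theta'$ that $d(\theta, 0) \le \sqrt{n} AM$. Hence, according to Lemma 2.1 we get for a universal
constant $K$

$$\mathbb{E} \sup_{\theta \in \Theta'} |X_\theta| \le K \int_{0}^{\frac{1}{2}\sqrt{n}AM} \sqrt{\log(1 + D(\Theta', d, \epsilon))}\, d\epsilon.$$

The result then follows using $D(\Theta', d, \epsilon) \le |\Theta'| = l + 1$. □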
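
The reduction to the finite set $\Theta'$ rests on the fact that a linear functional attains its maximal absolute value over the $\ell_1$-ball at a vertex, so that $\sup_{\|\theta\|_1 \le M} |\theta^T a| = M \max_j |a_j|$. A minimal numerical sketch of this fact follows; the vector `a`, the dimension and the random trial points are illustrative assumptions.

```python
import random

def sup_over_l1_ball(a, M):
    """sup of |theta^T a| over ||theta||_1 <= M, attained at a vertex M * e_p."""
    return M * max(abs(x) for x in a)

def random_point_in_l1_ball(l, M, rng):
    """A random point with l1-norm at most M (not uniformly distributed)."""
    raw = [rng.uniform(-1.0, 1.0) for _ in range(l)]
    scale = M * rng.random() / sum(abs(r) for r in raw)
    return [r * scale for r in raw]

def inner(u, v):
    return sum(x * y for x, y in zip(u, v))

rng = random.Random(0)
l, M = 8, 2.0
a = [rng.gauss(0.0, 1.0) for _ in range(l)]  # stands in for the vector a of the proof

# Vertex M * e_p with p maximizing |a_p|.
p = max(range(l), key=lambda j: abs(a[j]))
vertex = [M if j == p else 0.0 for j in range(l)]
vertex_value = abs(inner(vertex, a))

trial_values = [
    abs(inner(random_point_in_l1_ball(l, M, rng), a)) for _ in range(1000)
]
```

No trial point exceeds the vertex value, in line with $|\theta^T a| \le \|\theta\|_1 \max_j |a_j|$.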
Finally, we give a simple application: 
Example 2.1. Let $X \in \mathbb{R}^{n \times l}$ be normalized such that the columns have Euclidean norm
$\sqrt{n}$. Moreover, define $\epsilon := (\epsilon_1, ..., \epsilon_n)^T$ with Rademacher variables $\epsilon_i$. Then, for
$X_\theta := \epsilon^T X \theta$, $\theta \in \Theta = \{\theta \in \mathbb{R}^l : \|\theta\|_1 \le M\}$, there is a universal constant $K$
such that

$$\mathbb{E} \sup_{\theta \in \Theta} |X_\theta| \le K \sqrt{n \log(l+1)}\, M.$$
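
The order of magnitude in Example 2.1 can be explored with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions: the design matrix is random with columns rescaled to norm $\sqrt{n}$, the sizes `n`, `l` and `reps` are arbitrary, and the reference value $M\sqrt{2n\log(2l)}$ comes from the standard maximal inequality for finitely many sub-Gaussian variables, not from the paper, whose constant $K$ is unspecified.

```python
import math
import random

def normalized_design(n, l, rng):
    """Columns of a random n x l matrix, each rescaled to Euclidean norm sqrt(n)."""
    cols = []
    for _ in range(l):
        col = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in col))
        cols.append([c * math.sqrt(n) / norm for c in col])
    return cols

def sup_process(cols, eps, M):
    """sup over the l1-ball of |eps^T X theta|, which equals M * max_j |eps^T y_j|."""
    return M * max(abs(sum(e * c for e, c in zip(eps, col))) for col in cols)

rng = random.Random(0)
n, l, M, reps = 30, 100, 1.0, 300
cols = normalized_design(n, l, rng)

# Monte Carlo estimate of E sup |X_theta| over Rademacher sign vectors.
estimate = sum(
    sup_process(cols, [rng.choice((-1, 1)) for _ in range(n)], M)
    for _ in range(reps)
) / reps

# Reference value of the same order as sqrt(n log(l + 1)) * M.
reference = M * math.sqrt(2 * n * math.log(2 * l))
```

Repeating the experiment with larger `l` and fixed `n` shows the estimate growing roughly like $\sqrt{\log l}$, as the bound suggests.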

3 The Majorizing Measures Bound

In this part, we recall the Majorizing Measures Bound and some consequences such as
the Ellipsoid Theorem. We then apply these tools to prove Theorem 1.3.

3.1 Majorizing Measures 

Majorizing Measures are known to work well in situations where we have unit balls of
$p$-convex Banach spaces as index sets (see [GMPTJ08] for an example and [Pis89] or
[LT79] for the definitions of $p$-convexity, $p$-type and related terms). Here, we recall the
most important bounds arising in this scope. For the proofs and more detailed
introductions we refer to [RT88], [Tal94] and [Tal96].

We begin with a basic definition: 
Definition 3.1. Let $(T, d)$ be a metric space and $\beta > 0$. We set

M T, d) ~ mi {sup (f (log ^-^) 1 *) ' } , 

where $B(d, t, \epsilon)$ is the ball w.r.t. $d$ around $t$ with radius $\epsilon$ and the infimum is taken
over all probability measures $\mu$ on the Borel $\sigma$-algebra of $T$.

We then recall the following bounds: 

Lemma 3.1. (The Majorizing Measures Bound) Any sub-Gaussian process fulfills

$$\mathbb{E} \sup_{t \in T} X_t \le K \gamma_1(T, d)$$

for a universal constant $K$.

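Lemma 3.2. (The Ellipsoid Theorem) Let the metric $d$ be induced by the norm on $l^2(\mathbb{N})$.
Then, for

$$\mathcal{E} := \Big\{t \in l^2(\mathbb{N}) : \sum_{i \ge 1} \frac{t_i^2}{a_i^2} \le 1\Big\}$$

with $(a_i)_{i \ge 1} \in l^2(\mathbb{N})$ positive and non-increasing we have

$$\gamma_2(\mathcal{E}, d) \le K \sup_{i \ge 1} a_i \sqrt{i} \qquad (9)$$

for a universal constant $K$.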
Using Hölder's inequality, the bound (9) may be used to give an upper bound for
$\gamma_1(T, d)$. Finally, it holds that



Lemma 3.3. Consider a metric space (T,d) and a subset S of T. Then, 

$$\gamma_\beta(S, d) \le 2 \gamma_\beta(T, d).$$



3.2 Proof of Theorem 1.3

Now, we show how the process of Theorem 1.3 can be rewritten such that the relevant
set is an ellipsoid and how the bounds stated above can then be applied. To find
reasonable results, however, we have to assume strong correlation among the covariables.
By this, we mean that the columns of the corresponding matrices are not too different.
Or, more precisely, that the columns regarded as vectors can be collectively enveloped
by a small ellipsoid.



At first, we state a well-known fact:

Proposition 3.1. Let $\{X_t : t \in T\}$ be a stochastic process with an arbitrary index set
$T$. Assume that $\mathbb{E} \sup_{t \in T} X_t = \mathbb{E} \sup_{t \in T} (-X_t)$. Then,

$$\mathbb{E} \sup_{t \in T} |X_t| - \mathbb{E}|X_{t_0}| \le 2\, \mathbb{E} \sup_{t \in T} X_t$$

for $t_0 \in T$ arbitrary.
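
For a Rademacher process over a finite index set, the inequality of Proposition 3.1 can be verified exactly by enumerating all sign patterns; the symmetry assumption holds automatically because $\epsilon$ and $-\epsilon$ have the same distribution. The index set `T` and the choice of `t0` below are made-up illustrations.

```python
import itertools

def expectations(T, t0):
    """Exact E sup_t |X_t|, E sup_t X_t and E |X_{t0}| for X_t = sum_i eps_i * t_i."""
    n = len(t0)
    sup_abs = sup_plain = abs_t0 = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        values = [sum(s * c for s, c in zip(signs, t)) for t in T]
        sup_abs += max(abs(v) for v in values)
        sup_plain += max(values)
        abs_t0 += abs(sum(s * c for s, c in zip(signs, t0)))
    total = 2 ** n
    return sup_abs / total, sup_plain / total, abs_t0 / total

# Hypothetical finite index set in R^4; t0 is one of its elements.
T = [
    (1.0, 0.0, 0.5, -1.0),
    (0.2, 0.7, -0.3, 0.4),
    (-1.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 0.0, 0.0),
]
t0 = T[1]
e_sup_abs, e_sup, e_abs_t0 = expectations(T, t0)
```

The three returned numbers can be plugged directly into the inequality $\mathbb{E}\sup_t |X_t| - \mathbb{E}|X_{t_0}| \le 2\,\mathbb{E}\sup_t X_t$.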

Moreover, we set [j := and we denote by sconvA the symmetric convex hull of a set 
A. We are then prepared to give the proof of the theorem: 



Proof of Theorem 1.3. Setting

$$T' := M \cdot \mathrm{sconv}\{y_1', ..., y_{l'}'\}, \qquad T'' := M \cdot \mathrm{sconv}\{y_1'', ..., y_{l''}''\}$$



we obtain 



Esup 

6»eO 



i=i 



< E sup 

teT'xT" 



1=1 



Next we define aj 



2 ._ 4n 



• M 2 , (U'(t))i := t 2i -i and (Jl"(t))i := t 2i . Furthermore, 



^ t 2 



£:={teM 2n :)^< 1}. 



'^4 

i=i 1 



Then, 



E sup 



< Esup 

teE 



i=l 
To simplify the notation, we define $g_i(t) := g(t_{2i-1}, t_{2i})$
and we note that since $g$ is a contraction

$$d(t, \tilde{t}) := \sqrt{\sum_{i=1}^{n} (g_i(t) - g_i(\tilde{t}))^2} \le \|t - \tilde{t}\|_2 =: d_2(t, \tilde{t}). \qquad (10)$$



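Now, let $S$ be a maximal subset of $\mathcal{E}$ such that $d(t, \tilde{t}) > M$ for all $t, \tilde{t} \in S$, $t \ne \tilde{t}$.
Consequently, $(S, d_2)$ is a metric space and we have due to the Cauchy-Schwarz inequality

$$\sup_{t \in \mathcal{E}} \Big| \sum_{i=1}^{n} \epsilon_i g_i(t) \Big| \le \sqrt{n} M + \sup_{t \in S} \Big| \sum_{i=1}^{n} \epsilon_i g_i(t) \Big|.$$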

With regard to Proposition 3.1, the quantity to calculate is

$$\mathbb{E} \sup_{t \in S} \sum_{i=1}^{n} \epsilon_i g_i(t).$$

To bound this quantity, we apply Hoeffding's inequality, the contraction property (10)
and Lemma 3.1 to obtain for a universal constant $K$

$$\mathbb{E} \sup_{t \in S} \sum_{i=1}^{n} \epsilon_i g_i(t) \le K \gamma_1(S, d_2).$$

Moreover, $d_2^2(t, 0) \le \sum_i a_i^2 \le 8n^2M^2$, so that we arrive at (using Hölder's inequality)



o V l0g KB(d 2 ,t,e)) 



de 



M 



1 



< / f° g n(B(d 2 ,t,e)) 



de 



ill 



de\ 2 / f°° 1 

o 6 ° g K B ( d 2,t,e)) 



de 



We stress that the balls are with respect to the set $S$. Finally,



1 



(I v l0g ^( 5 ( rf 2,^e)) 



de. 



Thus, the proof can be concluded using Lemma 3.2 and Lemma 3.3.



□ 
4 Conclusion

Classical entropy bounds have proved to be a simple and useful tool in many applications.
However, Majorizing Measures are a priori more powerful in the treatment of empirical
processes. They are known to outmatch the classical entropy bounds for unit balls of
$p$-convex Banach spaces as index sets. While this is true, the unit ball of $(\mathbb{R}^l, \|\cdot\|_1)$
is not $p$-convex. So far, we only found reasonable results with Majorizing Measures by
invoking high correlation. The results were in this case independent of the dimension $l$,
which is quite important since we often assume $l \gg n$.